May 29, 2020

Let’s Get Started

  • Start by installing the necessary packages
  • We will need gapminder, ggthemes, plotly and tidyverse (plus pacman, which is used below to load them)
    • The main package we are using is ggplot2, which is part of the tidyverse suite

Load Libraries

library(pacman)
p_load(gapminder, ggthemes, plotly, tidyverse)

Data Visualisation with ggplot2

  • 3 key grammatical elements
    • Data: the dataset that is being plotted
    • Aesthetics: the scales onto which we map our data
    • Geometries: the visual elements used for our data
    • Every ggplot2 plot has these 3 key components

Grammatical elements

1st Layer - the data

ggplot(gapminder)

2nd layer - the aesthetics

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

3rd layer - the geometries

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point()

Next steps - Refining the visualisation

  • In the current scatterplot, we are visualising 2 variables (bivariate)
    • We want to find out how lifeExp relates to gdpPercap
    • We can see a generally positive relationship, i.e. countries with a higher gdpPercap tend to have a higher lifeExp, although the pattern looks curved rather than strictly linear
    • But the graphic is not very informative or clear
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point()
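
To put a rough number on the relationship seen in the scatterplot, one option is a correlation coefficient. The sketch below is an illustration rather than part of the original slides: it compares the Pearson correlation on the raw GDP scale with the correlation after taking log10 of GDP per capita. The names r_raw and r_log are made up for this example.

```r
library(gapminder)

# Pearson correlation between GDP per capita and life expectancy (raw scale)
r_raw <- cor(gapminder$gdpPercap, gapminder$lifeExp)

# Same correlation after taking log10 of GDP per capita
r_log <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)

# Compare the two side by side
c(raw = r_raw, log10 = r_log)
```

Both values are positive, and the log10 version is the larger of the two, which suggests the relationship is closer to linear on a log scale.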

Next steps - Focusing on the aesthetics

  • Annotating the graph with labels
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point() +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

Next steps - Focusing on the aesthetics

  • Using colour to show another variable
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  # Colour can help highlight a different variable
  geom_point(alpha = 0.5, size = 2) + 
  # Alpha stands for transparency, size denotes the point size
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

Next steps - Focusing on the aesthetics

  • Using size to show another variable
gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, colour = continent, size = pop)) + 
  # Size denotes the population size
  geom_point(alpha = 0.5) + 
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

Next steps - Focusing on the aesthetics

  • Facets help us break up a visualisation into its individual components
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point(alpha = 0.5, size = 2) +
  # . ~ continent tells ggplot2 to facet the plot by continent, one column per continent
  facet_grid(. ~ continent) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

Next steps - Focusing on the aesthetics

  • facet_wrap() is an alternative way to facet: it wraps the panels over several rows instead of a single row
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point(alpha = 0.5, size = 2) +
  # facet_wrap is similar to facet_grid but wraps the panels into rows and columns
  facet_wrap(~ continent) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")
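
The layout of the wrapped panels can be controlled. As a minimal sketch (the choice of 3 columns and free x scales is only an illustration, not something the slides prescribe), ncol sets the number of columns and scales = "free_x" lets each panel use its own x range. The name p_wrap is made up for this example.

```r
library(ggplot2)
library(gapminder)

p_wrap <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.5, size = 2) +
  # ncol fixes the number of columns;
  # scales = "free_x" gives every panel its own x-axis range
  facet_wrap(~ continent, ncol = 3, scales = "free_x")

p_wrap
```

Free scales make each panel easier to read on its own, at the cost of making panels harder to compare with each other.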

4th layer - statistics

  • We can do some preliminary statistical analysis through the visualisations: geom_smooth() adds a fitted line that summarises the relationship between variables
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point(alpha = 0.5, size = 2) +
  facet_grid(. ~ continent) +
  # geom_smooth allows us to plot the fitted relationship between 2 variables
  # method determines the modelling method, in this case a linear model
  # se determines whether the confidence band around the fitted line is drawn
  # lwd sets the line width
  geom_smooth(color = "black", method = "lm", se = TRUE, lwd = 0.5) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")
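
To see what those fitted lines represent, the same linear model can be fitted directly with lm(). This is a sketch under the assumption that one model per continent is wanted, which mirrors the faceted plot above. The names by_continent and coefs are made up for this example.

```r
library(gapminder)

# Split the data by continent, as the facets do
by_continent <- split(gapminder, gapminder$continent)

# Fit lifeExp ~ gdpPercap within each continent and collect the coefficients
coefs <- t(sapply(by_continent, function(d) {
  coef(lm(lifeExp ~ gdpPercap, data = d))
}))

coefs  # one row per continent: intercept and slope
```

Each row holds the intercept and slope of the straight line fitted to that continent's data.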

4th layer - statistics

  • Setting method = "loess" fits a local smoother instead of a straight line, so the fitted curve can bend with the data
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) + 
  geom_point(alpha = 0.5, size = 2) +
  facet_grid(. ~ continent) +
  geom_smooth(color = "black", method = 'loess', se = TRUE, lwd = 0.5) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

5th layer - coordinates

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() +
  scale_x_log10() + # Put the x axis on a log10 scale
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")
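
scale_x_log10() transforms the data before it is drawn, while the axis labels stay in the original units. A small sketch to check this, using layer_data() from ggplot2 to look at the values that are actually plotted (p_log and x_plotted are names made up for this example):

```r
library(ggplot2)
library(gapminder)

p_log <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# layer_data() returns the data after the scale transformation has been applied
x_plotted <- layer_data(p_log)$x

# The plotted x values should match log10 of the raw GDP per capita
all.equal(sort(x_plotted), sort(log10(gapminder$gdpPercap)))
```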

6th layer - themes

  • The ggthemes package provides ready-made themes that restyle a whole plot with one extra line
  • Check out the Examples here
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
  geom_point() +
  theme_solarized() + # Add Theme
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Tracking the relationship between GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")

6th layer - themes

  • Solarized theme, theme_solarized()
  • Wall Street Journal theme, theme_wsj()
  • FiveThirtyEight theme, theme_fivethirtyeight()
  • The Economist theme, theme_economist_white()

Adding Interactivity with plotly

  • Turn a ggplot into an interactive graph
plot <- gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, colour = continent, size = pop)) + 
  # Size denotes the population size
  geom_point(alpha = 0.5) + 
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "GDP per capita and Life Expectancy",
       caption = "Source: gapminder dataset")
ggplotly(plot)
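
By default the hover label lists every mapped aesthetic. One way to show only the country name, sketched here as an assumption about what a reader might want, is to map country to text and pass tooltip = "text" to ggplotly(). The names gm2007, p and widget are made up for this example.

```r
library(ggplot2)
library(plotly)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

# text is not drawn by geom_point(); ggplotly() reads it for the hover label
p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp, colour = continent,
                        size = pop, text = country)) +
  geom_point(alpha = 0.5)

# tooltip = "text" restricts the hover label to the text aesthetic
widget <- ggplotly(p, tooltip = "text")
widget
```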

Adding Interactivity with plotly

gapminder %>%
  plot_ly(x = ~gdpPercap, y = ~lifeExp, size = ~pop, 
          color = ~continent, frame = ~year, text = ~country,
          hoverinfo = "text", type = 'scatter', mode = 'markers') %>%
  layout(xaxis = list(type = "log"))

Univariate Plots

  • Use a histogram, or a variation of a histogram, to plot a single variable
gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp) %>%
  ggplot(aes(x = lifeExp)) +
  geom_histogram() +
  labs(x = "life expectancy",
       y = "frequency",
       title = "Histogram of global life expectancy in 2007")
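
geom_histogram() falls back to 30 bins when no bin setting is given, so it is usually worth choosing the bin width yourself. The sketch below sets a 5-year bin width (an arbitrary choice for illustration) and shows geom_density() as one variation of a histogram. The names gm2007, p_hist and p_dens are made up for this example.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

# Histogram with explicit 5-year bins; boundary = 0 aligns bin edges to multiples of 5
p_hist <- ggplot(gm2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, boundary = 0)

# Density plot: a smoothed variation of the histogram
p_dens <- ggplot(gm2007, aes(x = lifeExp)) +
  geom_density()

p_hist
p_dens
```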

Bivariate / Multivariate Plots

  • The choice of plot depends on whether each variable is discrete or continuous

    • Discrete variables can be boolean or categorical
    • Boolean = Yes or No, TRUE or FALSE
    • Categorical = Singapore, Hong Kong, Japan
    • Continuous = income / age / area

Bivariate / Multivariate Plots

For 1 discrete and 1 continuous, use:

  • geom_col()
  • geom_boxplot()
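
As a minimal sketch with the gapminder data, continent can serve as the discrete variable and lifeExp as the continuous one. Summarising with the mean before geom_col() is an assumption made for this example, as are the names gm2007, means, p_box and p_col.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

# Boxplot: distribution of life expectancy within each continent
p_box <- ggplot(gm2007, aes(x = continent, y = lifeExp)) +
  geom_boxplot()

# Column chart: one bar per continent, here showing the mean life expectancy
means <- aggregate(lifeExp ~ continent, data = gm2007, FUN = mean)
p_col <- ggplot(means, aes(x = continent, y = lifeExp)) +
  geom_col()

p_box
p_col
```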

Bivariate / Multivariate Plots

For 2 discrete, use:

  • geom_count()
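
gapminder has only one categorical variable besides country, so this sketch creates a second one by cutting gdpPercap into bands. The column income_band and its cut-offs (2000 and 10000) are hypothetical and chosen only for illustration.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

# Hypothetical income bands, created only for this example
gm2007$income_band <- cut(gm2007$gdpPercap,
                          breaks = c(0, 2000, 10000, Inf),
                          labels = c("low", "middle", "high"))

# geom_count() sizes each point by the number of countries in that combination
p_count <- ggplot(gm2007, aes(x = continent, y = income_band)) +
  geom_count()

p_count
```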

Bivariate / Multivariate Plots

For 2 continuous, use:

  • geom_line()
  • geom_area()
  • geom_step()
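
These geometries suit cases where one continuous variable is ordered, such as time. A minimal sketch, assuming we follow a single country (Singapore is picked arbitrarily) across the years in gapminder, with sg and p_line as made-up names:

```r
library(ggplot2)
library(gapminder)

# One row per survey year for a single country
sg <- subset(gapminder, country == "Singapore")

# Line chart of life expectancy over time, with points marking each observation
p_line <- ggplot(sg, aes(x = year, y = lifeExp)) +
  geom_line() +
  geom_point()

p_line
```

Swapping geom_line() for geom_step() or geom_area() draws the same data as a step function or a filled area.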

Try it out!

Use Coronavirus dataset found here